-
Large language models (LLMs) have demonstrated revolutionary capabilities in understanding complex contexts and performing a wide range of tasks. However, LLMs can also answer questions that are unethical or harmful, raising concerns about their applications. To regulate LLMs' responses to such questions, a training strategy called alignment can help. Yet alignment can be unexpectedly compromised when an LLM is fine-tuned for downstream tasks. This paper focuses on recovering the alignment lost during fine-tuning. We observe that an aligned LLM encodes two distinct directions: an aligned direction and a harmful direction. The LLM is inclined to answer questions along the aligned direction while refusing queries along the harmful direction. We therefore propose to recover the harmful direction of the fine-tuned model that has been compromised. Specifically, we restore a small subset of the fine-tuned model's weight parameters from the original aligned model using gradient descent. We also introduce a rollback mechanism to avoid overly aggressive recovery and maintain downstream task performance. Our evaluation on 125 fine-tuned LLMs demonstrates that our method reduces their harmful rate (the percentage of harmful questions answered) from 33.25% to 1.74% without substantially sacrificing task performance. In contrast, existing methods either reduce the harmful rate only to a limited extent or significantly impair normal functionality. Our code is available at https://github.com/kangyangWHU/LLMAlignment.
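To make the mechanism described above concrete, here is a minimal PyTorch-style sketch of the core idea: score fine-tuned weights by the gradient of a harmfulness objective, restore the top-scoring fraction from the aligned model, and roll back the restore if the downstream task suffers. The callables harmful_loss_fn and task_loss_fn and all fractions are illustrative assumptions, not the paper's code; the actual selection and rollback details follow the linked repository.

```python
import torch

def recover_alignment(ft_model, aligned_model, harmful_loss_fn, task_loss_fn,
                      restore_frac=0.01, rollback_tol=0.02):
    """Hypothetical sketch: restore a small fraction of fine-tuned weights
    from the aligned model, rolling back if task loss degrades too much."""
    harmful_loss_fn(ft_model).backward()       # populates p.grad as scores
    base_task = float(task_loss_fn(ft_model))
    for p_ft, p_al in zip(ft_model.parameters(), aligned_model.parameters()):
        if p_ft.grad is None:
            continue
        k = max(1, int(restore_frac * p_ft.numel()))
        idx = p_ft.grad.abs().view(-1).topk(k).indices
        flat = p_ft.data.view(-1)
        backup = flat[idx].clone()
        flat[idx] = p_al.data.view(-1)[idx]    # restore aligned weights
        # Rollback mechanism: undo the restore if task loss rises too much.
        if float(task_loss_fn(ft_model)) > base_task * (1 + rollback_tol):
            flat[idx] = backup
    return ft_model
```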
-
The open sourcing of large amounts of image data promotes the development of deep learning techniques, but it also carries the privacy risk of these image datasets being exploited by unauthorized third parties to train deep learning models for commercial or illegal purposes. To prevent such abuse of data, a poisoning-based technique called the unlearnable example has been proposed: adding imperceptible noise to the data significantly degrades the generalization performance of models trained on it. To further enhance robustness against adversarial training, existing works apply iterative adversarial training to both the defensive noise and the surrogate model. However, it remains unknown whether the robustness of unlearnable examples stems primarily from the enhancement of the surrogate model or from the defensive noise. Observing that simply removing the adversarial perturbation from the training process of the defensive noise improves the performance of robust unlearnable examples, we identify that the surrogate model's robustness alone accounts for the performance. Furthermore, we find that a negative correlation exists between the robustness of the defensive noise and the protection performance, indicating an instability issue in the defensive noise. Motivated by this, to further strengthen robust unlearnable examples, we introduce Stable Error-Minimizing noise (SEM), which trains the defensive noise against random perturbation instead of the time-consuming adversarial perturbation, improving the stability of the defensive noise. Through comprehensive experiments, we demonstrate that SEM achieves new state-of-the-art performance on CIFAR-10, CIFAR-100, and ImageNet Subset in terms of both effectiveness and efficiency.
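As a rough illustration of the stated idea, optimizing error-minimizing noise against random rather than adversarial perturbations, consider the following sketch. The model, loss_fn, and every hyperparameter value are placeholders and assumptions, not the authors' implementation or settings.

```python
import torch

def sem_noise(model, x, y, loss_fn, eps=8/255, sigma=4/255,
              steps=20, step_size=1/255):
    """Illustrative SEM-style update: error-minimizing defensive noise
    trained against random perturbations (all values are assumptions)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        # A cheap random perturbation replaces the costly adversarial one.
        rand = torch.empty_like(x).uniform_(-sigma, sigma)
        loss = loss_fn(model(x + delta + rand), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta -= step_size * grad.sign()   # minimize the training error
            delta.clamp_(-eps, eps)            # keep the noise imperceptible
    return delta.detach()
```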
-
Federated learning is a promising paradigm that allows multiple clients to collaboratively train a model without sharing their local data. However, the heterogeneous devices participating in federated learning, such as mobile phones and IoT devices with varying memory capacities, limit the scale of the model that can be trained and hence its performance. Mainstream approaches to memory limitations focus on width-slimming techniques, where different clients locally train subnetworks with reduced widths and the server then aggregates the subnetworks. The global model produced by these methods suffers performance degradation because of the measures needed to reconcile the varying subnetwork widths during aggregation. In this paper, we introduce FEDEPTH, a memory-adaptive depth-wise learning solution for FL that adaptively decomposes the full model into blocks according to each client's memory budget and trains the blocks sequentially to obtain a full inference model. Our method outperforms state-of-the-art approaches, achieving improvements in top-1 accuracy of 5% on CIFAR-10 and more than 10% on CIFAR-100. We also demonstrate the effectiveness of depth-wise fine-tuning on ViT. Our findings highlight the importance of memory-aware techniques for federated learning with heterogeneous devices and the success of the depth-wise training strategy in improving the global model's performance.
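The depth-wise decomposition step might look like the following minimal sketch, which greedily packs consecutive layers into blocks that fit a client's memory budget. The per-layer cost estimator est_cost and the greedy packing rule are hypothetical simplifications, not FEDEPTH's actual partitioning logic.

```python
import torch.nn as nn

def partition_blocks(layers, memory_budget, est_cost):
    """Illustrative depth-wise decomposition: group consecutive layers
    into blocks whose estimated training memory fits the budget."""
    blocks, current, used = [], [], 0.0
    for layer in layers:
        cost = est_cost(layer)                 # hypothetical memory estimate
        if current and used + cost > memory_budget:
            blocks.append(nn.Sequential(*current))
            current, used = [], 0.0
        current.append(layer)
        used += cost
    if current:
        blocks.append(nn.Sequential(*current))
    # Each client trains these blocks one at a time; stitching the trained
    # blocks back together yields the full inference model.
    return blocks
```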
-
The human estrogen receptor α (hERα) is involved in the regulation of growth, development, and tissue homeostasis. Agonists that bind to the receptor's ligand-binding domain (LBD) lead to the recruitment of coactivators and the enhancement of gene expression. In contrast, antagonists bind to the LBD and block the binding of coactivators, thus decreasing gene expression. In this work, we carry out simulations using the AWSEM (Associative memory, Water mediated, Structure and Energy Model)-Suite force field, along with the 3SPN.2C force field for DNA, to predict the structure of hERα and study its dynamics when binding to DNA and coactivators. Using simulations of antagonist-bound and agonist-bound hERα, both alone and with bound DNA and coactivators, principal component analyses and free energy landscape analyses capture the pathway of domain–domain communication for agonist-bound hERα. This communication is mediated through the hinge domains, which are ordinarily intrinsically disordered. These disordered segments manipulate the hinge domains much like the strings of a marionette as they twist in different ways when antagonists or agonists are bound to the ligand-binding domain.
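As generic background for the analyses mentioned above, here is a sketch of principal component analysis over an aligned trajectory and a one-dimensional free energy profile via F = -kT ln P. It is not the authors' analysis code; the array shapes, bin count, and the kT value (about 0.593 kcal/mol near 298 K) are assumptions.

```python
import numpy as np

def trajectory_pca(coords):
    """Illustrative PCA over a superposed trajectory.
    coords: shape (n_frames, n_atoms * 3), aligned to a reference."""
    centered = coords - coords.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]            # largest variance first
    return evals[order], evecs[:, order]

def free_energy_1d(projection, kT=0.593, bins=50):
    """Free energy along one principal component from F = -kT ln P."""
    hist, edges = np.histogram(projection, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):
        F = -kT * np.log(hist)
    return centers, F - np.nanmin(F[np.isfinite(F)])
```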
-
Bacteriophage T7 gp4 helicase has served as a model system for understanding mechanisms of hexameric replicative helicase translocation. The mechanistic basis of how nucleoside 5′-triphosphate hydrolysis and translocation of gp4 helicase are coupled is not fully resolved. Here, we used a thermodynamically benchmarked coarse-grained protein force field, Associative memory, Water mediated, Structure and Energy Model (AWSEM), with the single-stranded DNA (ssDNA) force field 3SPN.2C to investigate gp4 translocation. We found that the adenosine 5′-triphosphate (ATP) at the subunit interface stabilizes the subunit–subunit interaction and inhibits subunit translocation. Hydrolysis of ATP to adenosine 5′-diphosphate enables the translocation of one subunit, and new ATP binding at the new subunit interface finalizes the subunit translocation. The LoopD2 and the N-terminal primase domain provide transient protein–protein and protein–DNA interactions that facilitate the large-scale subunit movement. The simulations of gp4 helicase both validate our coarse-grained protein–ssDNA force field and elucidate the molecular basis of replicative helicase translocation.